The Bellman Data Quality Browser

Author

  • Divesh Srivastava
Abstract

Keynote talk abstract: Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns. Commonly encountered problems include missing data (null values), duplicates and default values in columns that are supposed to be treated as keys, data inconsistencies (violations of functional dependencies), and poor-quality join paths (lack of referential integrity). Compounding these data quality problems are incomplete and out-of-date metadata about the database and about the processes used to populate it. Together, these problems make analyzing the data particularly challenging. To address them effectively, we have built the Bellman data quality browser at AT&T. Bellman profiles the database and computes concise statistical summaries of its contents in order to identify approximate keys, frequent values of a field (often default values), and joinable fields and join paths with estimates of join sizes, and to understand database dynamics (changes in a database over time). In this talk, I'll describe the technology underlying Bellman and how it is used to help make sense of complex databases.
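
The abstract names several per-column summaries without saying how Bellman computes them. The sketch below is only an illustration of the kinds of statistics involved, not Bellman's implementation: it assumes a column is a Python sequence with None for nulls, and every function name (null_fraction, key_score, frequent_values, minhash_signature, estimated_resemblance, exact_join_size) is hypothetical. A key_score near 1.0 suggests an approximate key, and a dominant frequent value often signals a default. For joinability, it uses min-hash signatures, a standard compact summary for estimating the overlap (Jaccard resemblance) between two columns' value sets. The seeded blake2b hash family is likewise an assumption made for this example.

```python
import hashlib
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

# Assumption: a column is a sequence of values, with None standing for SQL NULL.
Column = Sequence[Optional[Hashable]]


def null_fraction(column: Column) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in column) / len(column) if column else 0.0


def key_score(column: Column) -> float:
    """Distinct non-null values / non-null values; 1.0 means an exact key."""
    values = [v for v in column if v is not None]
    return len(set(values)) / len(values) if values else 0.0


def frequent_values(column: Column, k: int = 3) -> List[Tuple[Hashable, int]]:
    """The k most common non-null values; a dominant one is often a default."""
    return Counter(v for v in column if v is not None).most_common(k)


def _seeded_hash(seed: int, value: Hashable) -> int:
    # Hypothetical hash family: blake2b over "seed:repr(value)", as a 64-bit int.
    digest = hashlib.blake2b(f"{seed}:{value!r}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(column: Column, num_hashes: int = 64) -> List[int]:
    """Min-hash summary of the column's distinct non-null values."""
    values = {v for v in column if v is not None}
    if not values:
        raise ValueError("column has no non-null values")
    # For each hash function, keep only the minimum hash over the value set.
    return [min(_seeded_hash(i, v) for v in values) for i in range(num_hashes)]


def estimated_resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching minima; estimates |A & B| / |A | B| for value sets."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_join_size(left: Column, right: Column) -> int:
    """Equi-join output size on non-null values: sum of count_left * count_right."""
    left_counts = Counter(v for v in left if v is not None)
    right_counts = Counter(v for v in right if v is not None)
    # Counter returns 0 for values absent from the right column.
    return sum(n * right_counts[v] for v, n in left_counts.items())
```

For example, on the column [101, 102, 103, 103, 0, 0, 0, None], null_fraction returns 0.125, key_score returns 4/7, and frequent_values(column, 1) returns [(0, 3)], flagging 0 as a likely default. Joining it with [101, 102, 103, 104] yields exact_join_size of 4. The two distinct-value sets have a true resemblance of 3/5, which the 64-entry signatures approximate without rescanning either column. The exact join size is included only as a reference point; a profiler working at the scale described above would rely on compact summaries rather than full value counts.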

Similar resources

Bellman : A Data Quality Browser

When a data analyst starts a new project, she is often presented with one or more very large databases (containing hundreds or thousands of tables). Extracting useful information from the databases can be a difficult problem: documentation is usually minimal, the data is poorly structured and difficult to join, and the quality of the data is often poor. As an aid in exploratory analysis, we are...

APPLICATION OF THE BELLMAN AND ZADEH'S PRINCIPLE FOR IDENTIFYING THE FUZZY DECISION IN A NETWORK WITH INTERMEDIATE STORAGE

In most real-life applications, we deal with the problem of transporting special fruits, such as bananas, which have particular production and distribution processes. In this paper we restrict our attention to formulating and solving a new bi-criterion problem on a network in which, in addition to minimizing the traversing costs, admissibility of the quality level of fruits is a main objecti...

Design of a Post-Flood Relief Management Web Service Using Volunteered Geographic Information (VGI) Based on Open-Source Technology

Access to precise spatial and real-time data plays a valuable role in the speed and quality of flood relief operations and, consequently, in reducing human and financial losses. Flood real-time data collection and processing, for instance the precise location and situation of flood victims, may be a big challenge in Iran regarding the hardware facilities (such as high resolution aerial ...

Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures

MOTIVATION Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visuali...

Internet QoS Routing Using the Bellman-Ford Algorithm

Multimedia applications are Quality of Service (QoS) sensitive, which makes QoS support indispensable in high speed Integrated Services Packet Networks (ISPN). An important aspect is QoS routing, namely, the provision of QoS routes at session set up time based on user request and information about available network resources. This paper develops optimal QoS routing algorithms within an Autonomo...

Publication date: 2006